FASTUS: A Finite-state Processor for Information Extraction from Real-world Text
Abstract
Approaches to text processing that rely on parsing the text with a context-free grammar tend to be slow and error-prone because of the massive ambiguity of long sentences. In contrast, FASTUS employs a nondeterministic finite-state language model that produces a phrasal decomposition of a sentence into noun groups, verb groups and particles. Another finite-state machine recognizes domain-specific phrases based on combinations of the heads of the constituents found in the first pass. FASTUS has been evaluated on several blind tests that demonstrate that state-of-the-art performance on information-extraction tasks is obtainable with surprisingly little computational effort.
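The two-pass idea in the abstract can be illustrated with a minimal sketch. This is not the FASTUS implementation: the toy lexicon, the one-letter tag set, the `NVN` event pattern, and the slot names `perpetrator`, `action`, and `target` are all assumptions made for illustration. Python's `re` module stands in for the finite-state machines, and the patterns used are purely regular. The first pass splits a tagged sentence into noun groups, verb groups, and particles; the second pass matches a domain pattern over the heads of those groups.

```python
import re

# Hypothetical toy lexicon (an assumption, not from the paper): word -> tag.
# D=determiner, A=adjective, N=noun, X=auxiliary, V=verb, P=particle
LEXICON = {
    "the": "D", "a": "D",
    "armed": "A", "local": "A",
    "guerrillas": "N", "embassy": "N", "police": "N", "station": "N",
    "have": "X", "was": "X",
    "attacked": "V", "bombed": "V",
    "in": "P", "by": "P",
}

# Pass 1: regular pattern over the tag string (one character per token).
PHRASE = re.compile(r"(?P<NG>D?A*N+)|(?P<VG>X*V)|(?P<PT>P)")

# Pass 2: regular pattern over the chunk-type string (N, V, P).
EVENT = re.compile(r"NVN")
ATTACK_VERBS = {"attacked", "bombed"}  # hypothetical domain lexicon


def chunk(tokens):
    """Decompose tokens into noun groups, verb groups, and particles."""
    # Unknown words get the tag "O", which no pattern matches, so they are skipped.
    tags = "".join(LEXICON.get(t.lower(), "O") for t in tokens)
    chunks = []
    for m in PHRASE.finditer(tags):
        words = tokens[m.start():m.end()]
        # The head is taken to be the last word of the group.
        chunks.append({"type": m.lastgroup, "words": words, "head": words[-1].lower()})
    return chunks


def extract_events(chunks):
    """Match a domain pattern over the sequence of chunk types and heads."""
    kinds = "".join({"NG": "N", "VG": "V", "PT": "P"}[c["type"]] for c in chunks)
    events = []
    for m in EVENT.finditer(kinds):
        subj, verb, obj = chunks[m.start():m.end()]
        if verb["head"] in ATTACK_VERBS:
            events.append({
                "perpetrator": subj["head"],
                "action": verb["head"],
                "target": obj["head"],
            })
    return events


if __name__ == "__main__":
    sentence = "The armed guerrillas have attacked the embassy".split()
    print(extract_events(chunk(sentence)))
```

Because each pass is a single left-to-right scan with a regular pattern, the cost grows roughly linearly with sentence length, which is the contrast with full context-free parsing that the abstract draws. The sketch is greedy and deterministic, whereas the abstract describes a nondeterministic model.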
Similar resources
SRI International: Description of the FASTUS System Used for MUC-3
FASTUS is a (slightly permuted) acronym for Finite State Automaton Text Understanding System. It is a system for extracting information from free text in English, and potentially other languages as well, for entry into a database, and potentially for other applications. It works essentially as a cascaded, nondeterministic finite state automaton. It is an information extraction system, rather th...
SRI International: description of the FASTUS system used for MUC-4
FASTUS is a (slightly permuted) acronym for Finite State Automaton Text Understanding System. It is a system for extracting information from free text in English, and potentially other languages as well, for entry into a database, and potentially for other applications. It works essentially as a cascaded, nondeterministic finite state automaton. It is an information extraction system, rathe...
SRI: Description of the JV-FASTUS System Used for MUC-5
INTRODUCTION AND BACKGROUND SRI International developed an information extraction system called FASTUS, a permuted acronym standing for "Finite State Automata-based Text Understanding System." The choice of acronym is somewhat misleading, however, because FASTUS is a system for information extraction, not text understanding. The former problem is much simpler and more tractable, characterized...
The SRI TIPSTER II Project
SRI participated in the Architecture Working Group (AWG) meetings and aided in the design, testing, and implementation of the TIPSTER document manager architecture. Their contributions concerned input on the nature of basic entities, such as documents and text segments, and ways of communicating information from extraction modules to other modules in order to allow extraction and detection modu...
SRI: description of the JV-FASTUS system used for MUC-5
INTRODUCTION AND BACKGROUND SRI International developed an information extraction system called FASTUS, a permuted acronym standing for "Finite State Automata-based Text Understanding System." The choice of acronym is somewhat misleading, however, because FASTUS is a system for information extraction, not text understanding. The former problem is much simpler and more tractable, characterized ...
Publication date: 1993